<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>What is Berkeley DB?</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="intro.html" title="Chapter 1.  Introduction" />
    <link rel="prev" href="intro_terrain.html" title="Mapping the terrain: theory and practice" />
    <link rel="next" href="intro_dbisnot.html" title="What Berkeley DB is not" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 12.1.6.2</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">What is Berkeley DB?</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 1.  Introduction </th>
          <td width="20%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="intro_dbis"></a>What is Berkeley DB?</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="intro_dbis.html#idm140526460982288">Data Access Services</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="intro_dbis.html#idm140526460880096">Data management services</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="intro_dbis.html#idm140526460973760">Design</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
        So far, we have discussed database systems in general
        terms. It is time now to consider Berkeley DB in particular
        and see how it fits into the framework we have introduced. The
        key question is, what kinds of applications should use
        Berkeley DB? </p>
      <p>
        Berkeley DB is an Open Source embedded database library
        that provides scalable, high-performance,
        transaction-protected data management services to
        applications. Berkeley DB provides a simple function-call API
        for data access and management. 
    </p>
      <p> 
        By "Open Source," we mean Berkeley DB is distributed under
        a license that conforms to the <a class="ulink" href="http://opensource.org/osd.html" target="_top"> Open Source
        Definition</a>. This license guarantees Berkeley DB is
        freely available for use and redistribution in other Open
        Source applications. Oracle Corporation sells commercial
        licenses allowing the redistribution of Berkeley DB in
        proprietary applications. In all cases the complete source
        code for Berkeley DB is freely available for download and use. 
    </p>
      <p>
        Berkeley DB is "embedded" because it links directly into
        the application. It runs in the same address space as the
        application. As a result, no inter-process communication,
        either over the network or between processes on the same
        machine, is required for database operations. Berkeley DB
        provides a simple function-call API for a number of
        programming languages, including C, C++, Java, Perl, Tcl,
        Python, and PHP. All database operations happen inside the
        library. Multiple processes, or multiple threads in a single
        process, can all use the database at the same time as each
        uses the Berkeley DB library. Low-level services like locking,
        transaction logging, shared buffer management, memory
        management, and so on are all handled transparently by the
        library.
    </p>
      <p> 
        The Berkeley DB library is extremely portable. It runs
        under almost all UNIX and Linux variants, Windows, and a
        number of embedded real-time operating systems. It runs on
        both 32-bit and 64-bit systems. It has been deployed on
        high-end Internet servers, desktop machines, and on palmtop
        computers, set-top boxes, in network switches, and elsewhere.
        Once Berkeley DB is linked into the application, the end user
        generally does not know that there is a database present at
        all. 
    </p>
      <p>
        Berkeley DB is scalable in a number of respects. The
        database library itself is quite compact (under 300 kilobytes
        of text space on common architectures), which means it is
        small enough to run in tightly constrained embedded systems,
        but yet it can take advantage of gigabytes of memory and
        terabytes of disk if you are using hardware that has those
        resources.
    </p>
      <p>
        Each of Berkeley DB's database files can contain up to 256
        terabytes of data, assuming the underlying filesystem is
        capable of supporting files of that size. Note that Berkeley
        DB applications often use multiple database files. This means
        that the amount of data your Berkeley DB application can
        manage is really limited only by the constraints imposed by
        your operating system, filesystem, and physical hardware.
    </p>
      <p> 
        Berkeley DB also supports high concurrency, allowing
        thousands of users to operate on the same database files at
        the same time. 
    </p>
      <p> 
        Berkeley DB generally outperforms relational and
        object-oriented database systems in embedded applications for
        a couple of reasons. First, because the library runs in the
        same address space, no inter-process communication is required
        for database operations. The cost of communicating between
        processes on a single machine, or among machines on a network,
        is much higher than the cost of making a function call.
        Second, because Berkeley DB uses a simple function-call
        interface for all operations, there is no query language to
        parse, and no execution plan to produce.
    </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526460982288"></a>Data Access Services</h3>
            </div>
          </div>
        </div>
        <p>
            Berkeley DB applications can choose the storage
            structure that best suits the application. Berkeley DB
            supports hash tables, Btrees, simple record-number-based
            storage, and persistent queues. Programmers can create
            tables using any of these storage structures, and can mix
            operations on different kinds of tables in a single
            application. 
        </p>
        <p>
            Hash tables are generally good for very large databases
            that need predictable search and update times for
            random-access records. Hash tables allow users to ask,
            "Does this key exist?" or to fetch a record with a known
            key. Hash tables do not allow users to ask for records
            with keys that are close to a known key. 
        </p>
        <p> 
            Btrees are better for range-based searches, as when the
            application needs to find all records with keys between
            some starting and ending value. Btrees also do a better
            job of exploiting <span class="emphasis"><em>locality of
            reference</em></span>. If the application is likely to
            touch keys near each other at the same time, the Btrees
            work well. The tree structure keeps keys that are close
            together near one another in storage, so fetching nearby
            values usually does not require a disk access. 
        </p>
        <p>
            Record-number-based storage is natural for applications
            that need to store and fetch records, but that do not have
            a simple way to generate keys of their own. In a record
            number table, the record number is the key for the record.
            Berkeley DB will generate these record numbers
            automatically. 
        </p>
        <p> 
            Queues are well-suited for applications that create
            records, and then must deal with those records in creation
            order. A good example is on-line purchasing systems.
            Orders can enter the system at any time, but should
            generally be filled in the order in which they were
            placed. 
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526460880096"></a>Data management services</h3>
            </div>
          </div>
        </div>
        <p> 
            Berkeley DB offers important data management services,
            including concurrency, transactions, and recovery. All of
            these services work on all of the storage structures. 
        </p>
        <p> 
            Many users can work on the same database concurrently.
            Berkeley DB handles locking transparently, ensuring that
            two users working on the same record do not interfere with
            one another. 
        </p>
        <p>
            The library provides strict ACID transaction semantics,
            by default. However, applications are allowed to relax the
            isolation guarantees the database system makes.
        </p>
        <p> 
            Multiple operations can be grouped into a single
            transaction, and can be committed or rolled back
            atomically. Berkeley DB uses a technique called
            <span class="emphasis"><em>two-phase locking</em></span> to be sure that
            concurrent transactions are isolated from one another, and
            a technique called <span class="emphasis"><em>write-ahead
            logging</em></span> to guarantee that committed changes
            survive application, system, or hardware failures.
        </p>
        <p> 
            When an application starts up, it can ask Berkeley DB
            to run recovery. Recovery restores the database to a clean
            state, with all committed changes present, even after a
            crash. The database is guaranteed to be consistent and all
            committed changes are guaranteed to be present when
            recovery completes.
        </p>
        <p>
            An application can specify, when it starts up, which
            data management services it will use. Some applications
            need fast, single-user, non-transactional Btree data
            storage. In that case, the application can disable the
            locking and transaction systems, and will not incur the
            overhead of locking or logging. If an application needs to
            support multiple concurrent users, but does not need
            transactions, it can turn on locking without transactions.
            Applications that need concurrent, transaction-protected
            database access can enable all of the subsystems.
        </p>
        <p> 
            In all these cases, the application uses the same
            function-call API to fetch and update records. 
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526460973760"></a>Design</h3>
            </div>
          </div>
        </div>
        <p> 
            Berkeley DB was designed to provide industrial-strength
            database services to application developers, without
            requiring them to become database experts. It is a classic
            C-library style <span class="emphasis"><em>toolkit</em></span>, providing a
            broad base of functionality to application writers.
            Berkeley DB was designed by programmers, for programmers:
            its modular design surfaces simple, orthogonal interfaces
            to core services, and it provides mechanism (for example,
            good thread support) without imposing policy (for example,
            the use of threads is not required). Just as importantly,
            Berkeley DB allows developers to balance performance
            against the need for crash recovery and concurrent use. An
            application can use the storage structure that provides
            the fastest access to its data and can request only the
            degree of logging and locking that it needs.
        </p>
        <p>
            Because of the tool-based approach and separate
            interfaces for each Berkeley DB subsystem, you can support
            a complete transaction environment for other system
            operations. Berkeley DB even allows you to wrap
            transactions around the standard UNIX file read and write
            operations! Further, Berkeley DB was designed to interact
            correctly with the native system's toolset, a feature no
            other database package offers. For example, on UNIX
            systems Berkeley DB supports hot backups (database backups
            while the database is in use), using standard UNIX system
            utilities, for example, dump, tar, cpio, pax or even cp.
            On other systems which do not support filesystems with
            read isolation, Berkeley DB provides a tool for safely
            copying files. 
        </p>
        <p> 
            Finally, because scripting language interfaces are
            available for Berkeley DB (notably Tcl and Perl),
            application writers can build incredibly powerful database
            engines with little effort. You can build
            transaction-protected database applications using your
            favorite scripting languages, an increasingly important
            feature in a world using CGI scripts to deliver HTML.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="intro_terrain.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="intro.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="intro_dbisnot.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Mapping the terrain: theory and
        practice </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> What Berkeley DB is not</td>
        </tr>
      </table>
    </div>
  </body>
</html>
